Knowledge Discovery and Data Mining (KDD-2003)

Authors

  • Sašo Džeroski
  • Luc De Raedt
Abstract

Feature selection is an important issue for any learning algorithm, since reduced feature sets lead to shorter learning time, reduced model complexity and, in many cases, a reduced risk of overfitting. When performing feature selection for RAM-based learning algorithms, we typically assume that the cost of accessing each feature is uniform. In multirelational data mining, especially when data are to be held in a relational database management system (RDBMS), this is no longer the case. The dominant cost in such a setting is the scan of a relation, so the cost of using a feature from a relation that needs to be scanned anyway is comparatively small, whereas the cost of adding a feature from a relation that has not been used before is high. This means that existing work on feature selection using the uniform cost assumption may not be applicable in a disk-based setting. In this paper, we report the results of a case study that extends prior work on multirelational feature selection, in particular in the context of a boosting algorithm. As shown by our study, using the previously developed strategies leads, on average, to a larger number of relations that need to be considered and loaded into memory, and thus to higher cost in a disk-based setting. Instead, a simple relation-oriented strategy can be used to minimize the cost of accessing additional relations. We describe experimental results to show how this basic strategy interacts with the feature selection variants proposed previously, and show that significant gains are made even in a main-memory setting.
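The cost asymmetry described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper: the `Feature` class, the `gain` scores, the `scan_cost` and `column_cost` values, and the greedy gain-per-cost rule are all hypothetical choices made only to show how charging for a relation scan changes which features get picked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    relation: str  # relation (table) the feature is read from
    gain: float    # hypothetical usefulness score, assumed to be given


def access_cost(feature, loaded, scan_cost, column_cost):
    """Small per-column cost if the relation is already scanned, else scan plus column."""
    if feature.relation in loaded:
        return column_cost
    return scan_cost + column_cost


def select_features(features, budget, scan_cost=100.0, column_cost=1.0):
    """Greedily pick the affordable feature with the best gain per unit of access cost."""
    selected, loaded, spent = [], set(), 0.0
    remaining = list(features)
    while remaining:
        # Keep only features whose access cost still fits in the budget.
        affordable = [
            f for f in remaining
            if spent + access_cost(f, loaded, scan_cost, column_cost) <= budget
        ]
        if not affordable:
            break
        best = max(
            affordable,
            key=lambda f: f.gain / access_cost(f, loaded, scan_cost, column_cost),
        )
        spent += access_cost(best, loaded, scan_cost, column_cost)
        loaded.add(best.relation)  # the relation is now scanned; later features from it are cheap
        selected.append(best)
        remaining.remove(best)
    return selected, loaded, spent
```

With `scan_cost=0.0` the sketch reduces to the uniform-cost assumption and tends to spread its choices across relations. With a large `scan_cost` it prefers additional features from relations that are already loaded, which is the relation-oriented behaviour the abstract argues for.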

Similar resources

A Tool for Support of the KDD Process

This paper presents basic ideas and results of the GOAL project, focusing on its knowledge discovery part. Within this project a KDD (Knowledge Discovery in Databases) package [7] has been designed and implemented. In this paper, the motivation, architecture, functionality, and one or two of the implemented DM (data mining) modules of the KDD Package are described in greater detail. The KDD Package supports ...

Knowledge discovery and data mining in biological databases

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the dom...

Experiences of Using a Quantitative Approach for Mining Association Rules

In recent years interest has grown in “mining” large databases to extract novel and interesting information. Knowledge Discovery in Databases (KDD) has been recognised as an emerging research area. Association rules discovery is an important KDD technique for better data understanding. This paper proposes an enhancement with a memory efficient data structure of a quantitative approach to mine a...

Knowledge Discovery In Databases Process

This book presents recent advances in knowledge discovery in databases (KDD), with a focus on market basket databases and time-stamped databases. Outline. 1 Knowledge Discovery in Databases. Introduction. Definitions of KDD. 2 The KDD process. Steps of KDD. Discovery goals. Mining Methodologies. Knowledge discovery in databases (KDD) is an important activity, and require only to process...

Knowledge Discovery and Data Mining: Towards a Unifying Framework

This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues and conclude with an analysis of challenges facing practitioners in the field.

Guest editorial data mining and knowledge discovery with evolutionary algorithms

Data mining (DM) consists of extracting interesting knowledge from real-world, large and complex data sets, and is the core step of a broader process called knowledge discovery from databases (KDD). In addition to the DM step, which actually extracts knowledge from data, the KDD process includes several preprocessing (data preparation) and postprocessing (knowledge refinement) steps. The goal of d...

Publication date: 2003